Bayesian Statistical Analysis of Bacterial Diversity

Authors

  • Jing Tang
  • Jukka Corander

Abstract

Bacteria play an important role in many ecological systems. Molecular characterization of bacteria, using either cultivation-dependent or cultivation-independent methods, has revealed the large scale of bacterial diversity in natural communities and the vastness of populations within a single species or genus. Understanding how bacterial diversity varies across environments, and within populations, should provide insight into many important questions of bacterial evolution and population dynamics. This thesis presents novel statistical methods for analyzing bacterial diversity from data produced by widely used molecular fingerprinting techniques. The first objective was to develop Bayesian clustering models that identify bacterial population structure. Bacterial isolates were typed using multilocus sequence typing (MLST), and Bayesian clustering models were used to explore the evolutionary relationships among them. Our method infers genetic population structure within an unsupervised clustering framework in which the dependence between loci is represented by graphical models. The population dynamics that generate such stratification were investigated with a stochastic model in which homologous recombination between populations can be quantified within a gene flow network. The second part of the thesis focuses on cluster analysis of community compositional data produced by two cultivation-independent methods: terminal restriction fragment length polymorphism (T-RFLP) analysis and fatty acid methyl ester (FAME) analysis. This cluster analysis groups bacterial communities with similar composition, an important step toward understanding how environmental and ecological perturbations influence bacterial diversity.
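
The abstract contrasts the thesis's graphical-model approach with clustering methods that assume independent loci. As a minimal sketch of that independent-loci baseline only (not the authors' model), the Python below scores a candidate partition of MLST allelic profiles with a Dirichlet-multinomial marginal likelihood, a common choice for Bayesian clustering of categorical data. The function names, the symmetric prior parameter `alpha`, and the per-locus allele counts `n_alleles` are hypothetical illustrations, not taken from the thesis.

```python
import math
from collections import Counter, defaultdict


def cluster_log_marginal(profiles, n_alleles, alpha=1.0):
    """Log marginal likelihood of one cluster of MLST allelic profiles.

    Assumption (hypothetical baseline): loci are independent, and the allele
    frequencies at locus j have a symmetric Dirichlet(alpha) prior over
    n_alleles[j] possible alleles.
    """
    n = len(profiles)
    total = 0.0
    for j, k in enumerate(n_alleles):
        counts = Counter(profile[j] for profile in profiles)
        # Dirichlet-multinomial normalising terms for locus j.
        total += math.lgamma(k * alpha) - math.lgamma(k * alpha + n)
        # Only observed alleles contribute; unobserved ones add
        # lgamma(alpha) - lgamma(alpha) = 0.
        for c in counts.values():
            total += math.lgamma(alpha + c) - math.lgamma(alpha)
    return total


def partition_log_marginal(profiles, labels, n_alleles, alpha=1.0):
    """Sum the cluster-wise log marginal likelihoods of a labelled partition."""
    clusters = defaultdict(list)
    for profile, label in zip(profiles, labels):
        clusters[label].append(profile)
    return sum(
        cluster_log_marginal(members, n_alleles, alpha)
        for members in clusters.values()
    )


# Toy allelic profiles at two loci (made-up data).
profiles = [("1", "1"), ("1", "1"), ("2", "2"), ("2", "2")]
n_alleles = [2, 2]
good = partition_log_marginal(profiles, [0, 0, 1, 1], n_alleles)
mixed = partition_log_marginal(profiles, [0, 1, 0, 1], n_alleles)
print(good > mixed)  # True
```

Under these assumptions, a partition that groups isolates with matching alleles scores higher than one that mixes them; a search procedure, for example one that moves isolates between clusters, could compare candidate labelings by this score. Modeling dependence between loci, as the thesis does, would need to replace the independent per-locus terms with a graphical-model likelihood.
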
A common feature of T-RFLP and FAME data is zero-inflation: zero values are observed far more often than would be expected under, for example, a Poisson distribution for discrete data or a Gaussian distribution for continuous data. We provide two strategies for modeling zero-inflation within the clustering framework, which were validated on both synthetic and complex empirical data sets. We show that our model, which accounts for dependencies between loci in MLST data, can produce better clustering results than methods that assume independent loci. Furthermore, we adopted computationally efficient algorithms to meet the growing demands of analyzing large-scale data. Our method for detecting homologous recombination in populations may provide a theoretical criterion for defining bacterial species. Finally, the clustering of bacterial community data, including T-RFLP and FAME profiles, represents an initial effort toward discovering the evolutionary dynamics that structure and maintain bacterial diversity in the natural environment.
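
The abstract does not specify which two zero-inflation strategies the thesis uses. As a generic illustration only, the sketch below implements the zero-inflated Poisson (ZIP) distribution, which mixes a point mass at zero (weight `pi`) with a Poisson(`lam`) count; the continuous analogue would mix a point mass at zero with, for example, a Gaussian density. The function names and the toy counts are hypothetical.

```python
import math


def zip_logpmf(y: int, lam: float, pi: float) -> float:
    """Log-probability of count y under a zero-inflated Poisson.

    With probability pi the observation is a structural zero; otherwise it is
    drawn from Poisson(lam). Assumes lam > 0 and 0 <= pi < 1.
    """
    if y < 0:
        raise ValueError("counts must be non-negative")
    # log Poisson pmf: -lam + y*log(lam) - log(y!)
    log_pois = -lam + y * math.log(lam) - math.lgamma(y + 1)
    if y == 0:
        # A zero can come from either the point mass or the Poisson part.
        return math.log(pi + (1.0 - pi) * math.exp(log_pois))
    return math.log1p(-pi) + log_pois


def zip_loglik(counts, lam: float, pi: float) -> float:
    """Total log-likelihood of independent counts under the same ZIP."""
    return sum(zip_logpmf(y, lam, pi) for y in counts)


# Made-up counts with many zeros.
counts = [0] * 8 + [3, 4]
# Plain Poisson (pi = 0) at its maximum-likelihood rate, the sample mean.
poisson_ll = zip_loglik(counts, lam=sum(counts) / len(counts), pi=0.0)
zip_ll = zip_loglik(counts, lam=3.5, pi=0.8)
print(zip_ll > poisson_ll)  # True
```

With these made-up counts, the ZIP likelihood exceeds that of a plain Poisson fitted at its maximum-likelihood rate, reflecting the excess zeros. In a clustering model, such a term would typically be evaluated per cluster and per variable, with `lam` and `pi` estimated for each cluster.
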

Similar resources

Genetic analysis of castor (Ricinus communis L.) using ISSR markers

Castor (Ricinus communis L.) is one of the most ancient medicinal oil crops in the world. It is widely distributed in different parts of Iran. In the present study, inter simple sequence repeat (ISSR) markers were used to evaluate the molecular genetic diversity among and within 12 castor accessions collected from 7 regions of Iran. In total, 16 ISSR primers amplified 166 loci...

Bayesian and Iterative Maximum Likelihood Estimation of the Coefficients in Logistic Regression Analysis with Linked Data

This paper considers logistic regression analysis with linked data. It is shown that, in logistic regression analysis with linked data, a finite mixture of Bernoulli distributions can be used for modeling the response variables. We proposed an iterative maximum likelihood estimator for the regression coefficients that takes the matching probabilities into account. Next, the Bayesian counterpart...

Structure of Wavelet Covariance Matrices and Bayesian Wavelet Estimation of Autoregressive Moving Average Model with Long Memory Parameter’s

In exploring and characterizing statistical populations, analyzing the data obtained from them is considered essential. One appropriate method of data analysis is to study the structure of the function fitted to these data. The wavelet transform is one of the most powerful tools for analyzing such functions, and the structure of wavelet coefficients is very impor...

Bayesian Analysis of Censored Spatial Data Based on a Non-Gaussian Model

Abstract: In this paper, we propose a skew Gaussian-log Gaussian model for the Bayesian analysis of censored spatial data. This approach extends the skew log Gaussian model to accommodate skewness, heavy tails, and censored data, three pervasive features of spatial data. We utilize data augme...

Investigation of Genetic Diversity and Structure Analysis of Different Citrus Genotypes Using ISSR Markers

In breeding programs, it is necessary to know the relatedness and genetic diversity of germplasm pools. The expansion of cultivated areas and high production levels indicate the importance of citrus in the global economy. Therefore, 110 citrus genotypes were evaluated using 12 ISSR markers. Overall, 154 polymorphic bands were scored, with an average of 12.8 alleles per primer. The po...

Publication date: 2009